#long-context LLM · 29/09/2025
oLLM Lets 100K-Token LLMs Run on 8 GB Consumer GPUs by Offloading Memory to SSDs
'oLLM offloads model weights and the KV cache to SSD so that very long-context LLMs can run on a single 8 GB GPU, trading inference speed, now bounded by SSD bandwidth, for massive context windows.'
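The core idea is simple to sketch: keep each layer's key/value tensors in memory-mapped files on the SSD and page in only the one layer currently doing attention. The snippet below is a minimal illustration of that pattern, not oLLM's actual API; `DiskKVCache` and every name in it are hypothetical.

```python
# Minimal sketch of an SSD-offloaded KV cache, assuming the pattern the
# summary describes for oLLM. NOT oLLM's API: all names are hypothetical.
from pathlib import Path

import numpy as np


class DiskKVCache:
    """Per-layer key/value tensors live in memory-mapped files on the SSD,
    so RAM/VRAM only ever holds the one layer currently doing attention."""

    def __init__(self, cache_dir, num_layers, max_tokens, num_heads, head_dim):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        shape = (2, max_tokens, num_heads, head_dim)  # axis 0: key=0, value=1
        # One memmap per layer; the OS pages it to/from the SSD on demand.
        self.layers = [
            np.lib.format.open_memmap(
                str(self.dir / f"layer_{i:03d}.npy"),
                mode="w+", dtype=np.float16, shape=shape,
            )
            for i in range(num_layers)
        ]
        self.length = 0  # tokens cached so far

    def append(self, layer, k, v):
        """Write one decode step's K/V for one layer straight to disk.
        k, v: float16 arrays of shape (new_tokens, num_heads, head_dim)."""
        n = k.shape[0]
        self.layers[layer][0, self.length : self.length + n] = k
        self.layers[layer][1, self.length : self.length + n] = v

    def advance(self, new_tokens):
        """Commit the step after every layer has appended its K/V."""
        self.length += new_tokens

    def read(self, layer):
        """Pull one layer's K/V back into RAM for the attention op;
        all other layers stay on the SSD."""
        kv = np.asarray(self.layers[layer][:, : self.length])
        return kv[0], kv[1]


if __name__ == "__main__":
    # Toy decode loop: 4 layers, room for 4096 tokens (~67 MB on disk at fp16).
    cache = DiskKVCache("kv_cache", num_layers=4, max_tokens=4096,
                        num_heads=8, head_dim=128)
    for _ in range(16):  # 16 decode steps of 1 token each
        k = np.random.randn(1, 8, 128).astype(np.float16)
        v = np.random.randn(1, 8, 128).astype(np.float16)
        for layer in range(4):
            cache.append(layer, k, v)
            keys, values = cache.read(layer)  # attention would consume these
        cache.advance(1)
    print(f"cached {cache.length} tokens across 4 layers on SSD")
```

Every `read` now runs at SSD speed rather than VRAM speed, which is exactly the trade the summary names: decode throughput drops, but a 100K-token cache that could never fit in 8 GB of VRAM sits comfortably on disk.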